ABSTRACT

We investigate how well the redshift distributions of galaxies sorted into photometric redshift bins can 
be determined from the galaxy angular two-point correlation functions. We find that the uncertainty 
in the reconstructed redshift distributions depends critically on the number of parameters used in each 
redshift bin and the range of angular scales used, but not on the number of photometric redshift bins. 
Using six parameters for each photometric redshift bin, and restricting ourselves to angular scales over 
which the galaxy number counts are normally distributed, we find that errors in the reconstructed 
redshift distributions are large; i.e., they would be the dominant source of uncertainty in cosmological 
parameters estimated from otherwise ideal weak lensing or baryon acoustic oscillation data. However, 
either by reducing the number of free parameters in each redshift bin, or by (unjustifiably) applying 
our Gaussian analysis into the non-Gaussian regime, we find that the correlation functions can be used 
to reconstruct the redshift distributions with moderate precision; e.g., with mean redshifts determined 
to ~ 0.01. We also find that dividing the galaxies into two spectral types, and thereby doubling the 
number of redshift distribution parameters, can result in a reduction in the errors in the combined 
redshift distributions. 

Subject headings: cosmology: theory - cosmology: observation 



1. INTRODUCTION 

There are many different techniques for determining the 
distance-redshift relation and/or growth-redshift relation 
motivated by the desire to understand the dark energy. 
Those that rely on the distances to a relatively small number
of objects, such as the Type Ia supernova method (e.g.
Riess et al., 1998), can use spectroscopic redshift deter- 
minations and thus avoid redshift error as a significant 
source of uncertainty. However, when the distance (and/or 
growth) constraints are derived from measurement of very 
large numbers of objects, spectroscopy can be a practical
impossibility. In such cases one must rely on "photomet- 
ric redshifts"; i.e., redshifts estimated from photometry in 
multiple broad bands (e.g. Loh & Spillar, 1986; Connolly 
et al., 1995; Sawicki et al., 1997).

The relatively low cost per object of imaging surveys 
compared to spectroscopic surveys is a great advantage 
and provides significant motivation for pursuing the tech- 
nique of estimating photometric redshifts. Imaging sur- 
veys can potentially constrain dark energy via a variety of 
techniques including cluster counting (e.g. Haiman et al., 
2001), cosmic shear (e.g. Hu, 2002; Huterer, 2002) and 
baryon acoustic oscillations (BAO) (e.g. Seo & Eisenstein, 
2003; Blake & Glazebrook, 2003; Padmanabhan et al., 
2006). It may even be possible for imaging surveys to 
use Type Ia supernovae, without spectroscopic follow-up,
to constrain cosmology (Barris & Tonry, 2004). 

But abandoning spectroscopy has its disadvantages too.
In general, there is some tolerance of redshift error, but 
less tolerance for uncertainty about the probability dis- 
tribution of those errors. The impact of redshift uncer- 
tainties on dark energy constraints has been studied for 
supernovae (Huterer et al., 2004), cluster number counts
(Huterer et al., 2004), weak lensing (Bernstein & Jain,
2004; Huterer et al., 2006; Ishak, 2005; Ma et al., 2006) 
and baryon oscillations (Zhan & Knox, 2005; Zhan, 2006). 

All of the studies cited in the above paragraph model 
the error distribution as Gaussian. However, photometric 
redshift error distributions, due to spectral-type/redshift
degeneracies, often have bimodal distributions, with one
smaller peak separated from a larger peak by $\Delta z$ of order
unity (e.g. Benítez, 2000; Fernández-Soto et al., 2001,
2002). Thus a fraction of galaxies have photometric redshifts
that are 'catastrophically' wrong. Here we study
how well the coarse properties of the true redshift distri- 
bution of galaxies in a given photometric redshift bin can 
be reconstructed from galaxy two-point correlation func- 
tions. 

The idea is that catastrophic photometric redshift ("photo-z")
errors introduce additional correlations between galaxies
in different redshift bins. In general, such errors will
alter both the amplitude and shape of the binned angu- 
lar correlation functions. Measurements of the correlation 
functions over a range of angular scales would thus provide 
valuable information to unravel the effects of large photo-z 
errors. 

We emphasize that we are not attempting a forecast of 
the photo-z errors achievable given all possible informa- 
tion. In particular, we neglect information from spectro- 
scopic calibration of the photo-z error distribution. Spec- 
troscopy, possibly combined with a "super" photometric 
(12 or more bands) photo-z training set, will play a critical
role^1. In this sense our forecasts here are highly conservative.
Further, the catastrophic errors are likely to be
avoided by use of luminosity function and surface brightness
priors. For recent results on spectroscopic calibration
of photo-z measurements for weak lensing, see Ilbert et al.
(2006).

^1 The current plan for photo-z calibration is described in
http://www.lsst.org/Science/photo-z-plan.pdf.

The outline of this paper is as follows. In § 2 we describe 
our model for the photo-z errors, the Fisher matrix we
use to constrain the parameters of this model, and our 
model for the galaxy angular power spectra. We present 
our results in § 3, including the details of our fiducial model 
and its impact on the resulting Fisher matrix constraints. 
We discuss some implications of our results in § 4 and draw
conclusions on the feasibility of constraining photo-z errors 
with galaxy angular correlation functions. 

2. METHOD 

In this section, we first introduce our model for the 
catastrophic photo-z errors, and then describe the Fisher 
matrix formalism (Jungman et al., 1996; Tegmark et al., 
1997) that we use to forecast how well the parameters 
of this error model can be constrained from observations 
of the galaxy angular correlation function, binned in redshift.
We restrict ourselves to forecasting here and leave
for later work the development of a practical algorithm for 
constraining the photo-z errors in a galaxy survey. 

2.1. Model for catastrophic photo-z errors 

To focus on the gross mislabeling of galaxy redshifts in- 
troduced by catastrophic photo-z errors, we bin the galaxy 
distribution in redshift and model the errors as a linear 
mixing of the values of the galaxy number density in each 
bin. In terms of this model, our goal is then to constrain 
the number of galaxies from each true-z bin that contribute
to the observed number in a given photo-z bin. 

We assign the same numerical values for redshift inter- 
vals to photo-z bins and true-z bins. The parameters of 
our error model are then defined as^2:

$N^a_{i\alpha}$ = mean number of galaxies per steradian of
spectral type $a$ in photo-z bin $i$ that come from
true-z bin $\alpha$.

By considering only the mean number of galaxies mix- 
ing between photo-z bins, we are ignoring possible angular 
fluctuations in the mixing. We expect this to be a good 
approximation on scales large enough for the fractions of 
different galaxy types to be uniform, and in the limit of 
homogeneous noise. The separate index for galaxy sub- 
populations is to allow for the possibility of different photo- 
z errors for different galaxy spectral types. However, for 
most of our results we ignore any information about galaxy
types and consider just the parameters $N_{i\alpha} = \sum_a N^a_{i\alpha}$ for
the entire sample of galaxies.

Using these parameters, we construct the redshift distribution
of galaxies in photo-z bin $i$,

$$\frac{dN^a_i}{dz\,d\Omega} = \sum_\alpha N^a_{i\alpha} \left( \frac{1}{N^a_\alpha}\,\frac{dN^a}{dz\,d\Omega}\,\psi_\alpha(z) \right), \qquad (1)$$

where $dN^a/dz\,d\Omega$ is the number of galaxies of spectral type
$a$ in redshift interval $dz$ and angular interval $d\Omega$, $\psi_\alpha(z)$ is
a top-hat window function defining the true-z bin $\alpha$, and

$$N^a_\alpha \equiv \int dz\,\frac{dN^a}{dz\,d\Omega}\,\psi_\alpha(z) \qquad (2)$$

is the mean number of galaxies (per steradian) in true-z
bin $\alpha$.

^2 We will use Latin indices from the beginning of the alphabet for the
galaxy sub-populations, Latin indices from the middle of the alphabet
for the observed redshift bins, and Greek indices for the true bins
(the ranges of the two kinds of redshift index may differ in general).

Because we are binning in redshift, our model cannot tell
us anything about the shape of the redshift distribution
within each bin (given by the term in parentheses in eq. 1).
We therefore assume that this is known. However, the
normalization of the redshift distribution in each bin is
determined by the parameters $N^a_{i\alpha}$.

If we integrate eq. 1 over redshift, we get an expression
for the total number of galaxies (per steradian) in each
photo-z bin,

$$N^a_i = \sum_\alpha N^a_{i\alpha}. \qquad (3)$$

We take the set of $N^a_i(\theta)$ as the data set that we use to
constrain the mean parameters $N^a_{i\alpha}$.

2.2. Fisher matrix 

Rephrased in more abstract terms, our problem is to
figure out how well a set of parameters $\{a_p\}$ can be constrained
from a data set $\{N^a_i\}$, through the influence of $a_p$
on the statistical properties of the data.

On large scales, the values of $N^a_i$ are Gaussian distributed
and their statistical properties are completely described
by the mean, $\bar N^a_i$, and covariance, $w^{ab}_{ij}(\theta, a_p)$. However,
on small scales, where nonlinear clustering becomes
important, the galaxy number density becomes significantly
non-Gaussian and higher-order correlations are required to
completely describe the statistics of the density field. To
avoid the complexities of non-Gaussianity, we limit ourselves
to a range in $\theta$ where the data are Gaussian to a good
approximation (see section 3.1 for details). Note, however,
that there is more information in the data on smaller scales
than we are considering here, which could improve the parameter
constraints beyond those shown below.

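For Gaussian data, the Fisher matrix is given by (Tegmark
et al., 1997),

$$F_{pp'} = \frac{\partial \bar{\mathbf N}^T}{\partial a_p}\,\mathbf w^{-1}\,\frac{\partial \bar{\mathbf N}}{\partial a_{p'}} + \frac{1}{2}\,{\rm Tr}\left[\mathbf w^{-1}\frac{\partial \mathbf w}{\partial a_p}\,\mathbf w^{-1}\frac{\partial \mathbf w}{\partial a_{p'}}\right], \qquad (4)$$

where $\bar{\mathbf N}$ is the mean of the data and $\mathbf w$ is the covariance
matrix of the data, defined by,

$$\mathbf N = \bar{\mathbf N} + \delta\mathbf N, \qquad (5)$$

with $\mathbf N = \{N^a_i\}$, and

$$\mathbf w = \langle \delta\mathbf N\,\delta\mathbf N^T \rangle, \qquad (6)$$

so that $\mathbf w = \{w^{ab}_{ij}(\theta, a_p)\}$. In the quadratic approximation
to the likelihood, the inverse Fisher matrix is then equal
to the covariance of the parameters $a_p$.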
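
As a minimal numerical sketch of the Gaussian Fisher matrix
described above (a term from the mean of the data plus a term
from its covariance, following Tegmark et al. 1997), the function
below evaluates $F_{pp'}$ from arrays of derivatives. The function
name `gaussian_fisher`, the array layout and the toy numbers are
illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_fisher(dmean, dcov, cov):
    """Fisher matrix for Gaussian-distributed data.

    dmean : (P, D) array, d(mean)/d(a_p) for each of P parameters
    dcov  : (P, D, D) array, d(covariance)/d(a_p)
    cov   : (D, D) array, data covariance at the fiducial model
    """
    cov_inv = np.linalg.inv(cov)
    # Mean term: (dN/da_p)^T w^{-1} (dN/da_p')
    mean_term = dmean @ cov_inv @ dmean.T
    # Covariance term: (1/2) Tr[w^{-1} dw/da_p  w^{-1} dw/da_p']
    a = np.einsum("ij,pjk->pik", cov_inv, dcov)
    cov_term = 0.5 * np.einsum("pij,qji->pq", a, a)
    return mean_term + cov_term

# Toy case (hypothetical numbers): two independent data points whose
# means are the two parameters, with fixed variance 0.25 each.
fisher = gaussian_fisher(np.eye(2), np.zeros((2, 2, 2)), 0.25 * np.eye(2))
# In the quadratic approximation, parameter errors come from the inverse.
errors = np.sqrt(np.diag(np.linalg.inv(fisher)))
```

In this toy case the forecast errors equal the data standard
deviation, as expected when each parameter is measured directly
by one data point.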

In this paper, we consider two parameter sets for $\{a_p\}$.
First, we calculate the Fisher matrix for the parameters
$\{N^a_{i\alpha}\}$. We then add the linear galaxy bias with respect to
the dark matter in each redshift bin ($\{b^a_\alpha\}$) and the amplitude
of the power spectrum at $z = 0$ (which is degenerate
with the galaxy bias at $z = 0$) as additional parameters.

The expression for the Fisher matrix in eq. 4 simplifies
considerably when we Fourier transform over the variable
$\theta$. In this case, the covariance matrix becomes block-diagonal
(with one block for each value of the conjugate
variable $\ell$) due to isotropy. The second term in eq. 4 then
becomes,

$$f_{\rm sky} \sum_\ell \frac{2\ell+1}{2} \sum_{\substack{(ai),(ai)' \\ (bj),(bj)'}} \frac{\partial C_{(ai)(ai)'}}{\partial a_p}\,\left(C^{-1}\right)_{(ai)'(bj)'}\,\frac{\partial C_{(bj)'(bj)}}{\partial a_{p'}}\,\left(C^{-1}\right)_{(bj)(ai)}, \qquad (7)$$

where $f_{\rm sky}$ is the fractional angular area of the sky covered
by the survey, $C_{(ai)(bj)}(\ell)$ is the Fourier transform of
$w^{ab}_{ij}(\theta)$ and we have written out the matrix multiplications
explicitly^3.
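
The block-diagonal structure in $\ell$ makes the covariance term
cheap to evaluate: each multipole contributes its own trace. The
sketch below assumes the usual mode-counting weight
$f_{\rm sky}(2\ell+1)/2$ per multipole; the function name
`fisher_cl_term` and the array shapes are hypothetical choices
made for illustration.

```python
import numpy as np

def fisher_cl_term(ells, cl, dcl, f_sky):
    """Covariance (power spectrum) term of the Fisher matrix.

    ells  : (L,) multipoles included in the sum
    cl    : (L, D, D) matrix of auto/cross spectra at each multipole
    dcl   : (P, L, D, D) derivatives of cl w.r.t. each parameter
    f_sky : fraction of the sky covered by the survey
    """
    cl_inv = np.linalg.inv(cl)                      # inverse at each ell
    a = np.einsum("lij,pljk->plik", cl_inv, dcl)    # C^{-1} dC/da_p
    traces = np.einsum("plij,qlji->pql", a, a)      # Tr[...] at each ell
    weights = f_sky * (2 * ells + 1) / 2.0
    return traces @ weights

# Toy case (hypothetical): one redshift bin whose spectrum is a single
# amplitude parameter a = 2 at every multipole from 2 to 10.
ells = np.arange(2, 11)
cl = np.full((ells.size, 1, 1), 2.0)
dcl = np.ones((1, ells.size, 1, 1))
fisher_amp = fisher_cl_term(ells, cl, dcl, f_sky=0.5)
```

For a single amplitude parameter the result reduces to the number
of independent modes divided by twice the amplitude squared, which
is a convenient sanity check on the index bookkeeping.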

When we Fourier transform the first term in eq. 4, only
the monopole ($\ell = 0$) contributes to the mean, so that
this term becomes,

$$\sum_{(ai),(bj)} \frac{\partial \bar N^a_i}{\partial a_p}\,\left(C^{-1}(\ell = 0)\right)_{(ai)(bj)}\,\frac{\partial \bar N^b_j}{\partial a_{p'}}. \qquad (8)$$

This term tells us what information the mean of the data
contributes to the parameter constraints. Because $\bar{\mathbf N}$ does
not depend on the galaxy bias, eq. 8 is nonzero only for
the parameters $\{N^a_{i\alpha}\}$.

While formally the cosmological monopole of the power
spectrum cannot be determined (because of unknown contributions
from super-horizon-sized modes), for a survey
that covers only a fraction of the sky, this term will be
aliased and will contain contributions from the power spectrum
at small (but nonzero) $\ell$. We therefore define an
"effective" monopole variance,

$$C_{(ai)(bj)}(\ell = 0) = \delta^K_{(ai)(bj)} \left(\sigma_{N^a_i}\right)^2. \qquad (9)$$

From the predicted amplitude of the angular power spectrum
at small $\ell$ we set $\sigma_{N^a_i} = 10^{-[\ldots]} \times \bar N^a_i$. We consider
this a conservative upper bound.

To avoid confusion in the strict interpretation of this 
term, we point out that it is identical in form to adding 
extra Fisher information from an independent measurement
of $\bar{\mathbf N}$.

Note that, in our notation, Ma et al. (2006) and Zhan
(2006) have omitted the term in eq. 8 and instead fixed
$N_\alpha$ ($= \sum_i N_{i\alpha}$). By letting $N_\alpha$ vary freely, we are assuming
no prior knowledge of the true redshift distribution.
And by including the term in eq. 8, we are taking into
account the fact that the mean number of galaxies in each
photo-z bin can be determined from the data. Huterer
et al. (2006) and Zhan & Knox (2005) use a method more
similar to ours in this respect; i.e., they fix the total number
of galaxies in each photometric redshift bin and allow
$N_\alpha$ to vary. Huterer et al. (2006) find there is very little
difference between the two approaches for the case of weak
lensing. This may not be the case though for galaxy power
spectra.

2.3. Galaxy angular power spectrum with photo-z errors

Using eq. 3, the observed angular power spectrum is
related to the "true" angular power spectrum (i.e. without
photo-z errors) by,

$$C_{(ai)(bj)}(\ell) = \sum_{\alpha,\beta} N^a_{i\alpha}\,N^b_{j\beta}\,\bar P_{(a\alpha)(b\beta)}(\ell), \qquad (10)$$

where bars denote quantities averaged over a redshift bin
and $\bar P_{(a\alpha)(b\beta)}(\ell)$ is the power spectrum of the normalized density
fluctuations $\delta N^a_\alpha / \bar N^a_\alpha$ (while $C_{(ai)(bj)}(\ell)$ is the power
spectrum of $\delta N^a_i$). We also add shot noise to the model
for the observed power spectrum,

$$C^{\rm shot}_{(ai)(bj)} = \delta^K_{(ai)(bj)}\,\frac{N^a_i}{A_{\rm survey}},$$

where $A_{\rm survey}$ is the angular area of the survey in square
arcminutes and $\delta^K$ is the Kronecker delta function. We
plot fiducial power spectra given by eq. 10 in fig. 1 along
with the variation of the power spectra when one of the
parameters $N^a_{i\alpha}$ is changed by 1-$\sigma$.

^3 We treat each pair of indices $(ai)$ as a single index on a two-dimensional
matrix for the power spectrum between redshift bins for
each galaxy sub-population.
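
To illustrate how mixing between bins generates cross-correlations,
the sketch below builds observed spectra from true-bin spectra for
a single galaxy type, assuming (as in the text) that the true bins
have no intrinsic cross-correlation and leaving out shot noise. The
function name `observed_cl` and all numbers are hypothetical.

```python
import numpy as np

def observed_cl(n_mix, p_true):
    """Observed spectra from true-bin spectra under linear mixing.

    n_mix  : (I, A) mean counts, photo-z bin i drawn from true-z bin alpha
    p_true : (L, A) auto power spectra of the normalized fluctuations
             in each true-z bin (cross-spectra between true bins set to 0)
    returns (L, I, I) spectra of the count fluctuations in photo-z bins
    """
    n_mix = np.asarray(n_mix, dtype=float)
    p_true = np.asarray(p_true, dtype=float)
    # C_ij(l) = sum_alpha N_{i alpha} N_{j alpha} P_alpha(l)
    return np.einsum("ia,la,ja->lij", n_mix, p_true, n_mix)

# Two photo-z bins, 10% of each bin's galaxies come from the other true bin.
n_mix = np.array([[90.0, 10.0], [10.0, 90.0]])
p_true = np.array([[1e-4, 2e-4]])          # one multipole, two true bins
cl_mixed = observed_cl(n_mix, p_true)
cl_clean = observed_cl(np.diag([100.0, 100.0]), p_true)
```

With no mixing the off-diagonal element of `cl_clean` vanishes,
while in `cl_mixed` it is a sizeable fraction of the auto spectra;
this off-diagonal signal is what carries the information on the
mixing parameters.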

We use the halo model to obtain analytic expressions
for the linear galaxy bias and nonlinear 3-D galaxy power
spectrum as described in the appendix. We then use the
Limber approximation (Limber, 1953) to project this into
the binned angular galaxy power spectrum,

$$\bar P_{(a\alpha)(b\beta)}(\ell) = \delta^K_{\alpha\beta}\,\frac{2\pi^2}{\ell^3} \int d\chi\;\chi\,W^a_\alpha(\chi)\,W^b_\beta(\chi)\,\Delta^2\!\left(k = \frac{\ell}{\chi}, z\right), \qquad (11)$$

where $\chi(z)$ is the comoving angular diameter distance as
a function of redshift, $\Delta^2(k)$ is the 3-D variance of galaxy
number density fluctuations per logarithmic interval in $k$,
and

$$W^a_\alpha(\chi) = \frac{1}{N^a_\alpha}\,\frac{dN^a}{d\chi\,d\Omega}\,\psi_\alpha(\chi)$$

is the probability distribution for finding a galaxy of type
$a$ in z-bin $\alpha$ at a comoving distance $\chi$ in the survey, in the
absence of photo-z errors, with $\psi_\alpha(\chi)$ a top-hat window
function for the z-bin $\alpha$, as defined in eq. 2, and normalization
$\int W^a_\alpha(\chi)\,d\chi = 1$. To simplify the computation, we
ignore the redshift evolution of $\Delta^2(k, z)$ in eq. 11, evaluating
it instead at the mean redshift of the bin.

Note the delta function in eq. 11, so that our model neglects
any intrinsic cross-correlation between redshift bins.
For $\ell = 50$ (near the peak of the angular power spectrum)
and redshift bins with width of 0.5, we have checked that 
the cross-correlations between bins are less than 1% of the 
auto-correlations. However, with our fiducial model, the 
catastrophic photo-z errors induce correlations at the level 
of ~ 5% of the auto-correlation (see fig. 1, left panel). We 
therefore expect that neglecting intrinsic cross-correlations 
should not alter our Fisher matrix errors by more than 
~ 1%, which is completely unimportant for the conclu- 
sions we draw here. 

An important effect that we have neglected here is mag- 
nification of galaxies by weak lensing from intervening 
large-scale structure (Turner, 1980; Turner et al., 1984). 
Lensing can induce correlations between different photo-z
bins beyond those that would be found in an unlensed
model. The amplitude of the weak-lensing induced correlations
depends on the luminosity function of the galaxies
in the survey, but could be of similar magnitude to the
photo-z error induced correlations (Villumsen, 1995). We
expect this effect can be calculated with sufficient accuracy
that residual uncertainties will not lead to significant
confusion with photo-z error induced correlations.
Fig. 1. - Angular galaxy auto and cross power spectra for photo-z bin 1 in our fiducial model. The points with error bars are the fiducial
power spectra, while the lines are 1-$\sigma$ variations of the parameters $a_{12} = N_{12}$ or $a_{21} = N_{21}$ (left) and $a_{13} = N_{13}$ or $a_{31} = N_{31}$ (right). For
clarity, we have shown only the cross power spectrum that varies the most when the parameters are varied, with the auto power spectra
(fiducial points and variation) for reference. The 1-$\sigma$ variations of the parameters are from the Fisher matrix (see fig. 4 below).



2.4. Galaxy bias parameterization 

If we restrict ourselves to linear theory, then the galaxy
power spectrum factors into a product of the dark matter
power spectrum times a constant (scale-independent but
redshift-dependent) bias:

$$\bar P^{\rm gal,lin}_{(a\alpha)(b\beta)} = b^a_\alpha\,b^b_\beta\,\bar P^{\rm DM,lin}_{(a\alpha)(b\beta)}. \qquad (12)$$

At small scales, the bias $b^a_\alpha$ becomes scale-dependent and
this factorization no longer holds. Therefore, when we
include the parameters $\{b^a_\alpha\}$ in our Fisher matrix analysis,
we are approximating the galaxy power spectrum with the
linear power spectrum. Below, we evaluate the effects of
this assumption.

2.5. Galaxy sub-populations 

We consider two ways of utilizing the multiband photometric
data in a galaxy survey. On the one hand, we imagine
assigning photo-z's to each galaxy and binning them
according to their photo-z's while ignoring any other features
of the galaxies. We can then use the number counts
in each bin along with our model for the power spectrum to
constrain the true redshift distribution of all the galaxies
in each bin. On the other hand, we can sort the galaxies
according to their spectral (or morphological) type and repeat
the analysis (jointly) for each sub-population. From
this larger data set, we can then infer constraints on the
parameters $N_{i\alpha} = \sum_a N^a_{i\alpha}$ for the full galaxy sample by
adding the variances of the $N^a_{i\alpha}$ (including cross-terms)^4.
We expect the $N^a_{i\alpha}$ to be different for each sub-population
not only because of different redshift distributions for each
galaxy type, but also because of different photo-z error distributions.

While the shot noise increases by dividing the galaxy
sample into sub-populations, it may be possible to improve
the constraints on the parameters of the total galaxy
sample for several reasons. First, we gain knowledge of the
mean number density of each sub-group, which contributes
to the term in eq. 8. Second, we gain extra information
from the cross-correlation between the sub-groups, which
is even more helpful if the redshift distribution of one of
the types can be well-constrained on its own. In fact, the
exposure times and bands for the survey could even be
optimized for the best constrained galaxy sub-population.
In the limit of exact redshifts of a sub-sample of galaxies,
Newman (2006) has shown that cross-correlating with
the photo-z sample can place significant constraints on the
photo-z errors. Third, a given galaxy sub-group may be
more biased than the total galaxy sample and could therefore
have equivalent or greater S/N than the total sample
even though the noise increases for the sub-samples.

^4 This is equivalent to making a formal change of variables in the
inverse Fisher matrix from $N^a_{i\alpha}$ to $N_{i\alpha}$.

2.6. Mean redshift in each photo-z bin 

As discussed in the introduction, we would like to know 
if the constraints on the photo-z error parameters shown 
in fig. 4 will be sufficient to enable interesting constraints 
on dark energy parameters from weak lensing and baryon 
acoustic oscillation surveys. To properly address this ques- 
tion, we should forecast joint constraints on a suite of cos- 
mological and photo-z parameters with both galaxy and 
shear data. However, this analysis has already been done 
for the case of Gaussian photo-z errors (Zhan, 2006) and 
we choose not to repeat that effort here. Instead, we make 
contact with previous work by reducing the constraints on 
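the $N^a_{i\alpha}$ to constraints on the mean redshift in each photo-z
bin, defined as,

$$\bar z^a_i \equiv \frac{\int dz\,z\,dN^a_i/dz}{\int dz\,dN^a_i/dz}. \qquad (13)$$

The errors on $\bar z^a_i$ are extracted from the inverse Fisher
matrix,

$$\left(F^{-1}_{\bar z}\right)_{(ai)(bj)} = \sum_{(ck\rho),(ck\rho)'} \frac{\partial \bar z^a_i}{\partial N^c_{k\rho}}\,\left(F^{-1}_N\right)_{(ck\rho)(ck\rho)'}\,\frac{\partial \bar z^b_j}{\partial N^{c'}_{k'\rho'}}, \qquad (14)$$

where $F_N$ is the Fisher matrix in eq. 4.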
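
The step from errors on the mixing parameters to errors on the mean
redshift is ordinary linear error propagation with a Jacobian. The
sketch below treats one galaxy type and assumes each true-z bin can
be represented by a single known redshift `z_true` (standing in for
the known shape within the bin); the function name and the numbers
are hypothetical.

```python
import numpy as np

def mean_z_and_errors(n_mix, z_true, cov_n):
    """Propagate the covariance of N_{i alpha} to the mean redshifts.

    n_mix  : (I, A) fiducial mean counts N_{i alpha}
    z_true : (A,) representative redshift of each true-z bin (assumed known)
    cov_n  : (I*A, I*A) inverse Fisher matrix for N_{i alpha}, with the
             parameters flattened in row-major (i, alpha) order
    """
    n_mix = np.asarray(n_mix, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    n_bins, n_true = n_mix.shape
    n_tot = n_mix.sum(axis=1)
    z_mean = n_mix @ z_true / n_tot
    # d(zbar_i)/d(N_{k rho}) = delta_ik (z_rho - zbar_i) / N_i
    jac = np.zeros((n_bins, n_bins * n_true))
    for i in range(n_bins):
        jac[i, i * n_true:(i + 1) * n_true] = (z_true - z_mean[i]) / n_tot[i]
    cov_z = jac @ cov_n @ jac.T
    return z_mean, np.sqrt(np.diag(cov_z))

n_mix = np.array([[90.0, 10.0], [10.0, 90.0]])
z_true = np.array([0.25, 0.75])
cov_n = np.eye(4)                     # toy: unit, uncorrelated errors on N
z_mean, sigma_z = mean_z_and_errors(n_mix, z_true, cov_n)
```

The Jacobian shows why outliers matter: a true-z bin far from the
mean redshift of a photo-z bin enters with a large weight
$(z_\rho - \bar z_i)$, so uncertainty in a small catastrophic
fraction can dominate the error on $\bar z_i$.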

It has been shown in Huterer et al. (2006), Ma et al.
(2006), Zhan & Knox (2005) and Zhan (2006) that useful dark
energy constraints require the error on the mean redshift in
each photo-z bin to be constrained to ~ 0.003 near z ~ 1,
and increasing rapidly towards higher redshift. We therefore
use 0.003 as a benchmark for assessing our results.

3. RESULTS 

In this section, we first describe a simple fiducial model 
for the galaxy survey and photo-z errors, and then show
the forecasted constraints from the Fisher matrix in eq. 7.
We then study how our results depend on the fiducial
model in order to draw conclusions independent of the 
many assumptions in our model. 

3.1. Fiducial model 

We choose our fiducial model to mimic the LSST^5. Throughout,
we consider a survey covering 20,000 sq. deg. and bin
the galaxies in photo-z over the range 0 to 3. Our fiducial
cosmological parameters are $\Omega_m = 0.24$, $\Omega_b h^2 = 0.022$,
$\Omega_\Lambda = 0.76$, $h = 0.72$, and $\sigma_8 = 0.74$.

To ensure that our assumption of Gaussianity in eq. 4
is reasonable, we limit the $\ell$ range in eq. 10 by adding
large noise to each element of $P_{(ai)(bj)}(\ell)$ with $\ell > \ell_{\max}(z)$,
where $\ell_{\max}(z) = \chi(z)\,k_{\max}(z)$ and^6 $\Delta^2_{\rm DM}(k_{\max}, z) = 0.4$.

We also set a lower bound on $\ell$ to justify our use of the
Limber approximation for the angular power spectrum in
eq. 11. The Limber approximation relies on the observation
that, when projecting the 3-D power spectrum into
2 dimensions, the dominant contribution is from those
Fourier modes that do not oscillate significantly along the
line of sight. We can quantify this statement by considering
only modes with line-of-sight component of the
wavevector $k_3 < 2\pi/\Delta\chi(z)$, where $\Delta\chi(z)$ is the comoving
width of the z-bin under consideration. The Limber
approximation also requires $\ell \gg k_3\,\chi(z)$. Putting these
together, we set $\ell_{\min}(z) = 4\pi\chi(z)/\Delta\chi$ (with an arbitrary
factor of 2 inserted just to be conservative). In table 1, we
show the values of $\ell_{\min}$ and $\ell_{\max}$ as a function of redshift in our
fiducial model for 6 z-bins over the range $0 < z < 3$. We
evaluate both $\ell_{\min}$ and $\ell_{\max}$ at the centers of the photo-z
bins.
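
The lower multipole cutoff can be evaluated directly from the
comoving distance. The sketch below assumes a flat
$\Lambda$CDM background with matter density 0.24 and takes
$\ell_{\min} = 4\pi\chi/\Delta\chi$ at the bin center; because only
a ratio of distances enters, the Hubble constant drops out. The
function names and the trapezoid integration are illustrative
choices, not the paper's code.

```python
import numpy as np

def comoving_distance(z, omega_m=0.24, n_steps=2000):
    """Comoving distance in units of c/H0 for flat LCDM (trapezoid rule)."""
    zs = np.linspace(0.0, z, n_steps + 1)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    dz = zs[1] - zs[0]
    return dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))

def ell_min(z_lo, z_hi, omega_m=0.24):
    """ell_min = 4 pi chi(z_center) / Delta chi for the bin [z_lo, z_hi]."""
    chi_mid = comoving_distance(0.5 * (z_lo + z_hi), omega_m)
    delta_chi = (comoving_distance(z_hi, omega_m)
                 - comoving_distance(z_lo, omega_m))
    return 4.0 * np.pi * chi_mid / delta_chi

# Six bins of width 0.5 over 0 < z < 3, as in the fiducial model.
edges = np.linspace(0.0, 3.0, 7)
ell_mins = [ell_min(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
```

Under these assumptions the values for the two lowest bins fall
just below the corresponding table 1 entries of 7 and 23, which is
consistent with the tabulated cutoffs being rounded up.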

We use the redshift distribution from Song & Knox
(2003) (which is based on Subaru observations with limiting
magnitude in R of 26),

$$\frac{dN}{dz\,d\Omega}(z) = N_{\rm tot}\,\exp\left[-\left(\frac{z}{1.2}\right)^{1.2}\right] \times \begin{cases} z^{1.3}, & z < 1 \\ z^{1.1}, & z > 1 \end{cases} \qquad (15)$$

with the normalization, $N_{\rm tot}$, set by $\int dz\,dN/dz\,d\Omega = 65$
per sq. arcmin.
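
As a rough sketch of how the per-bin counts follow from such a
distribution, the code below normalizes a redshift distribution to
65 galaxies per sq. arcmin and integrates it over top-hat bins. The
piecewise shape used in `dn_dz_shape` (powers 1.3 and 1.1 below and
above $z = 1$, with an exponential cutoff in $z/1.2$) is an assumed
reading of the fiducial distribution, and the integration limits
and function names are hypothetical.

```python
import numpy as np

def dn_dz_shape(z):
    """Unnormalized dN/dz (assumed piecewise form, continuous at z = 1)."""
    z = np.asarray(z, dtype=float)
    power = np.where(z < 1.0, 1.3, 1.1)
    return z ** power * np.exp(-(z / 1.2) ** 1.2)

def trapezoid(y, x):
    """Simple trapezoid-rule integral of samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Normalize so the integral over all redshifts is 65 per sq. arcmin
# (the upper limit z = 10 is an arbitrary cutoff for the tail).
z_grid = np.linspace(0.0, 10.0, 20001)
n_tot = 65.0 / trapezoid(dn_dz_shape(z_grid), z_grid)

# Mean counts per sq. arcmin in six top-hat true-z bins over 0 < z < 3.
edges = np.linspace(0.0, 3.0, 7)
n_alpha = np.array([
    n_tot * trapezoid(dn_dz_shape(zb), zb)
    for zb in (np.linspace(lo, hi, 2001)
               for lo, hi in zip(edges[:-1], edges[1:]))
])
```

The bin counts sum to less than 65 because part of the assumed
distribution lies above $z = 3$, outside the binned range.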

^5 www.lsst.org

^6 "DM" denotes the dark matter power spectrum.



Table 1

Multipole range for the galaxy power spectrum as
a function of redshift

z range      $\ell_{\min}(z)$   $\ell_{\max}(z)$
0.0 - 0.5    7                  114
0.5 - 1.0    23                 458
1.0 - 1.5    45                 1018
1.5 - 2.0    71                 1875
2.0 - 2.5    103                3195
2.5 - 3.0    140                5186




When considering galaxy sub-populations, we consider 
two spectral types that we label "red" and "blue," roughly 
depending on the absence or presence of active star forma- 
tion. In the absence of a well-motivated model, we gen- 
erated several redshift distributions for the red and blue 
sub-populations in an ad-hoc fashion, and compared the 
results between them. We require only that the redshift 
distributions sum to give eq. 15 and that the blue distribution
dominate at large redshifts ($z > 1$).

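For the red and blue galaxy sub-populations, we set

$$\frac{dN^{\rm red}}{dz\,d\Omega} = R\,N_{\rm tot}\,z^{[\ldots]}\exp\left[-[\ldots]\,z^{[\ldots]}\right] \qquad (16)$$

and

$$\frac{dN^{\rm blue}}{dz\,d\Omega} = \frac{dN}{dz\,d\Omega} - \frac{dN^{\rm red}}{dz\,d\Omega},$$

with $R = 0.8$, $r_1 = 1.3$, $r_2 = 1.4$. The
distributions are shown in fig. 2.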
Fig. 2. - Fiducial redshift distributions for the "red" and "blue"
galaxy sub-populations (eq. 16) and the total galaxy sample ("total,"
eq. 15).

Our fiducial model for $N^a_{i\alpha}$ is based on the estimated
photo-z's for a simulated random sample of 100,000 galaxies
over redshifts from 0 to 4 with colors assigned by filtering
spectra from a sample of 10 redshift-evolved spectral
energy distributions (SEDs). The simulation assumed
photometric data was available in 6 filters (ugrizy), modelled
after the LSST, with the data limited in the i-band
at $i < 25$ and an S/N of 10-15 at the depth of the survey.
The depth of the simulation is what one would achieve
after about 400 visits per filter. This is just a fiducial approach
as the errors can be optimized by weighting the
exposure times in each band separately. The photo-z of
each galaxy was estimated by matching the galaxy colors
with a SED template library. No priors on the luminosity
function or surface brightness were used, although such priors
can significantly improve the photo-z estimates in some cases. In
this regard, our fiducial model is therefore a worst-case
scenario.

To model the errors for the galaxy sub-populations, we 
divided the templates for the simulated galaxy SEDs into 2 
groups based on the presence or absence of strong emission 
lines. 
The fiducial parameters, $N^a_{i\alpha}$, are constructed by first
creating the matrix, $E^a_{i\alpha} = N^a_{i\alpha}/N^a_\alpha$, by binning the photo-z
vs. z plane and normalizing so that $\sum_i E^a_{i\alpha} = 1$ (for each
$a$ and $\alpha$). This normalization conserves the total number
of galaxies in the survey. We then use the fiducial redshift
distribution in eq. 15 to create $N^a_\alpha$ according to eq. 2 and
multiply with $E^a_{i\alpha}$ to get the parameters $N^a_{i\alpha}$. An example
of our fiducial model for the $N^a_{i\alpha}$ is plotted in fig. 3.
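
The construction of the mixing fractions from a simulated catalog
can be sketched as a normalized two-dimensional histogram. The
function name `mixing_fractions`, the tiny catalog and the counts
per true-z bin below are hypothetical; in particular, galaxies whose
photo-z falls outside the bin edges are simply dropped by the
histogram, which is an assumption of this sketch.

```python
import numpy as np

def mixing_fractions(z_true, z_phot, edges):
    """Fraction of galaxies from true-z bin alpha landing in photo-z bin i.

    Returns an (I, A) array whose columns each sum to 1 (for true-z bins
    that contain at least one galaxy).
    """
    counts, _, _ = np.histogram2d(z_phot, z_true, bins=[edges, edges])
    col_tot = counts.sum(axis=0, keepdims=True)
    return np.divide(counts, col_tot,
                     out=np.zeros_like(counts), where=col_tot > 0)

# Tiny hypothetical catalog: four galaxies, two bins.
edges = np.array([0.0, 0.5, 1.0])
z_true = np.array([0.2, 0.3, 0.7, 0.8])
z_phot = np.array([0.1, 0.6, 0.9, 0.2])
e_mix = mixing_fractions(z_true, z_phot, edges)

# Multiply by the mean counts in each true-z bin to get N_{i alpha}.
n_true = np.array([100.0, 50.0])
n_mix = e_mix * n_true
```

Because each column of the fraction matrix sums to one, summing
`n_mix` over the photo-z index recovers the counts in each true-z
bin, which is the sense in which the normalization conserves the
total number of galaxies.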

3.2. Parameter constraints 

Our main results are in fig. 4, which shows the Fisher
constraints on the parameters $N_{i\alpha}$ assuming a 10% prior
on the galaxy bias and 100% prior on the $N^a_{i\alpha}$. The model
for the open squares includes the "red" and "blue" galaxy
sub-populations, while the model for the filled squares ignores
this information.

In column 2 of table 2, we show the constraints on
the mean redshift in each photo-z bin (eq. 14) implied by
the constraints on the $N^a_{i\alpha}$ without galaxy sub-populations
(filled squares in fig. 4). These are two orders of magnitude
larger than the "benchmark" value (described in section 2.6)
needed for constraining dark energy parameters.
In the following subsections, we discuss three ways that
the constraints on $\bar z_i$ could possibly be improved.

3.2.1. Adding galaxy sub-populations 

For a wide range of fiducial models, we find that dividing 
our galaxy sample into "red" and "blue" sub-populations 
improves the forecasted constraints on the redshift distri- 
bution in each photo-z bin, as shown by the open squares 
in fig. 4 and column 3 of table 2. We also show forecasted
constraints on the redshift distributions of the sub-populations
in fig. 5. Comparing the errors on the mean
redshift in each photo-z bin in table 2, we see that there
is a significant improvement over the constraints obtained
without using information about galaxy sub-populations,
but the errors are still much larger than the "benchmark" for dark
energy surveys given in section 2.6.

Because our fiducial models for the biases and redshift 
distributions of the sub-populations (in the absence of 
photo-z errors) are rather ad hoc, we have varied the parameters
in eqs. 16 and A2 over a range of physically reasonable
values and find no change to the qualitative nature
of our results. This is discussed more in section 3.3.2. 

3.2.2. Sensitivity to parameterization 

We show in the left panel of fig. 6 and column 4 of table 2
the forecasted parameter constraints with a fiducial
photo-z error model that only allows mixing between adjacent
photo-z bins and with the number of photo-z error
parameters reduced to only those that can take nonzero
values in the fiducial model. The fiducial errors assume a
5% contribution from each of the adjacent bins to a given
photo-z bin. This crudely mimics a Gaussian model for
the photo-z errors. We have also tightened the prior on
the galaxy bias from 10% to 1%. While the constraints
on the $N^a_{i\alpha}$ in the left panel of fig. 6 are moderately improved
from the default model in fig. 4, the constraints on
$\bar z_i$ in table 2 improve by nearly two orders of magnitude.
This shows that reducing the number of free parameters in
each photo-z bin indeed has a large impact on the ability
to constrain the redshift distribution in each bin.

Table 2

Constraints on the mean redshift of each photo-z bin

z range      $\sigma_{\rm LSST}$   $\sigma_{\rm sub}$   $\sigma_{\rm Gauss}$   $\sigma$ ($\ell_{\max} = 4000$)
0.0 - 0.5    0.29                  0.059                0.0050                 0.0030
0.5 - 1.0    0.068                 0.041                0.010                  0.0050
1.0 - 1.5    0.12                  0.036                0.0087                 0.0096
1.5 - 2.0    0.16                  0.049                0.0091                 0.016
2.0 - 2.5    0.23                  0.085                0.0079                 0.022
2.5 - 3.0    0.28                  0.61                 0.0047                 0.025

3.2.3. Sensitivity to range of angular scales 

The forecasted constraints are very sensitive to the maximum
$\ell$ used in the galaxy power spectrum (but are rather
insensitive to the minimum $\ell$ cutoff). In particular, for the
lowest photo-z bin, the maximum cutoff at $\ell = 114$ (from
table 1) removes some of the baryon features in the power
spectrum that can help in diagnosing photo-z errors (Zhan,
2006).

To demonstrate this sensitivity, we show the forecasted
constraints when $\ell_{\max}(z) = 4000$ for all $z$ in the right
panel of fig. 6 and column 5 of table 2. The constraints
on the mean redshift in each photo-z bin are two orders
of magnitude smaller than those with $\ell_{\max}$ from table 1
(labelled $\sigma_{\rm LSST}$ in table 2).

Recall that the maximum $\ell$ cutoff is imposed to validate
our assumption of Gaussian data (in e.g. eq. 4). Therefore,
the constraints presented in this section should be inter- 
preted only up to non-Gaussian corrections, which could 
be quite large. Our results are an indication that there is 
much to be gained by developing the appropriate tools for 
analyzing the non-Gaussian case. 

3.3. Fiducial model dependence 

To test the robustness of our conclusions, we recom- 
puted the Fisher matrix in eq. 7 while varying the number 
of redshift bins, the galaxy bias and halo occupation distri- 
bution in the nonlinear power spectrum, and the fiducial 
model for the photo-z errors. 

3.3.1. Redshift distribution 

For comparison, we use a second fiducial model for $E^a_{i\alpha}$
with 10% of the galaxies in each photo-z bin uniformly 
distributed over the remaining photo-z bins. This model, 
though not physically motivated, gives us a reference for 
determining the sensitivity of the Fisher constraints to the 
fiducial error model. 

We show ratios of the Fisher matrix errors obtained with 
these two fiducial photo-z error models in fig. 7. For most 
of the parameters, the different fiducial models lead to dif- 
ferences in the forecasted errors of a factor of < 5. The 
LSST fiducial model for the photo-z errors is actually well- 
motivated by our photo-z estimation simulation so any un- 
certainties in the fiducial model will be much smaller than 
the changes introduced by this artificial "uniform" error 
model. We therefore conclude that uncertainties in the 
fiducial photo-z errors will affect our forecasted constraints 
by factors of < a few. 
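
As an illustration of the artificial "uniform" error model, the following sketch builds a table N[i][alpha] in which a fixed fraction of the galaxies in each photo-z bin i is spread evenly over the other redshift bins. The galaxy counts, the function name `uniform_scatter_model`, and the reading of "uniformly distributed" as equal numbers per remaining bin are assumptions made here for illustration.

```python
def uniform_scatter_model(n_gal_per_bin, scatter_fraction=0.10):
    """Return N[i][alpha]: galaxies in photo-z bin i whose true redshift is in bin alpha."""
    n_bins = len(n_gal_per_bin)
    model = []
    for i, n_i in enumerate(n_gal_per_bin):
        leaked = scatter_fraction * n_i / (n_bins - 1)  # share sent to each other bin
        row = [leaked] * n_bins
        row[i] = (1.0 - scatter_fraction) * n_i         # galaxies assigned correctly
        model.append(row)
    return model

# Hypothetical galaxy counts per photo-z bin (arbitrary units), six bins.
counts = [10.0, 20.0, 30.0, 20.0, 10.0, 5.0]
N = uniform_scatter_model(counts)
```

Each row of `N` sums to the input count for that photo-z bin, so the model redistributes galaxies without changing the total in any bin.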



Schneider, Knox, Zhan & Connolly 






Fig. 3. - The fiducial redshift distribution with catastrophic photo-z errors: $N^{a}_{i\alpha}$ (for the "LSST" model - see text). The two plots show
the same parameters in different perspectives. On the left is the number density, $N^{a}_{i\alpha}$, as a function of redshift (indexed by i = photometric
z and $\alpha$ = spectroscopic z). On the right, each window shows a different photo-z bin index, i, while each bar in a given window shows a
different spectroscopic index, $\alpha$, for the given i. For example, the bar between z = 0.5 and z = 1 in the window for "photo-z bin 1" is the
number density of galaxies with spectroscopic redshifts in the range 0.5 < z < 1 that have been given photometric redshifts in the range
0 < z < 0.5.

Fig. 4. - Forecasted constraints for the parameters $N_{i\alpha}$ divided by the fiducial $N_i$ in each photo-z bin, i.e. the error on the fraction of
outliers within each photo-z bin. The filled squares are the fractional constraints when no galaxy sub-populations are considered, while the
open squares are the constraints when the galaxy sample is divided into "red" and "blue" spectral types. For the open squares, the parameters
for the "red" and "blue" sub-populations ($N^{a}_{i\alpha}$) were constrained first, and then these constraints were combined to produce the constraints
on $N_{i\alpha}$ shown here (by summing over the index $a$ in the inverse Fisher matrix components). A 10% prior on the galaxy bias and a 100%
prior on the $N^{a}_{i\alpha}$ were imposed. The fiducial model assumed "LSST" photo-z errors (see text) and sky coverage of 20,000 sq. deg.

Fig. 5. - Forecasted fractional constraints on $N^{a}_{i\alpha}$ for the "red" (squares) and "blue" (circles) galaxy sub-populations. These constraints
take into account the cross-correlation between the red and blue samples. We used a 10% prior on the linear galaxy bias and a 100% prior
on the $N^{a}_{i\alpha}$.

Fig. 6. - Demonstration of two factors that significantly affect the errors on the mean redshift in each photo-z bin (see table 2). For the
filled squares on the left, we have chosen a fiducial photo-z error model that only introduces mixing between adjacent bins and have fixed the
parameters that are known to be zero in this model. This roughly mimics a Gaussian photo-z error distribution. For the filled squares on
the right, we have replaced the z-dependent $\ell$ cutoff from table 1 with a maximum $\ell = 4000$ for all the photo-z bins. For reference, the open
squares in both plots show the constraints with the default fiducial model without galaxy sub-populations. The left panel assumes a 1% prior
on the linear galaxy bias, while the right panel assumes a 10% prior.

Fig. 7. - Ratios of forecasted errors for the parameters $N_{i\alpha}$
using 2 different fiducial models for the photo-z errors. We first
scale the forecasted errors by the fiducial $N_i$ in each photo-z bin
(to obtain fractional errors) and then divide the errors obtained
using a uniform distribution of 10% scatter from each photo-z bin
by the forecasted errors obtained using a photo-z error model based
on the LSST (see text). Each line shows a different photo-z bin
(corresponding to the index i in the parameters).



3.3.2. Galaxy bias and nonlinear power spectrum 

The results we have shown so far model the galaxy power 
spectrum using only the linear theory prediction. To make 
sure that this approximation will not affect the qualitative 
nature of our results, we compare in fig. 8 the forecasted 
errors for the photo-z error distribution obtained using the 
linear theory power spectrum to those obtained with the 
nonlinear model (see the appendix). We see that our use of
the linear power spectrum is a good approximation, which
is to be expected given our truncation in $\ell$ described at
the beginning of section 3.1.




Fig. 8. - Ratios of forecasted constraints for the parameters $N_{i\alpha}$
using the nonlinear power spectrum divided by the constraints using
the linear theory power spectrum. The fiducial redshift distribution
and photo-z errors are the same in each case. Each line shows a
different photo-z bin (corresponding to the index i in the parameters).

The fiducial linear galaxy bias and nonlinear galaxy
power spectrum depend on a model for the way that galaxies
populate dark matter halos. The details of our fiducial
model are explained in the appendix, section A.1, but this
model has only been very loosely constrained by observations
(Cooray, 2006; Abazajian et al., 2005). Therefore, to
build confidence in our use of a necessarily ad hoc fiducial
model, we compared the parameter constraints from
the Fisher matrix when we vary the ad hoc parameters.
We find that when the "satellite" galaxy normalization
and slope (defined in eq. A2) are varied by 50% and 25%
respectively, the changes in the photo-z error parameter
constraints are less than 10% and 0.01% for all the redshift
bins. This amount of variation will not affect our
conclusions.

3.3.3. Number of photo-z bins 

We have compared the fractional constraints on the photo- 
z error parameters when the number of photo-z bins is 
varied from 2 to 10 and do not find a significant variation. 
As the number of photo-z bins is increased, there is more 
information about the photo-z errors from the additional 
cross-correlations between photo-z bins, but the number of 
parameters to constrain also increases. So, the fractional 
constraints we show here for six bins should be represen- 
tative of the constraints that would be obtained for any 
moderate number of bins. Note, however, that having the
same fractional constraints for a larger number of parameters
means we have more information about the photo-z
error distribution with more photo-z bins. Of course, for a
sufficiently large number of photo-z bins, the shot noise will
begin to dominate.
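
The trade-off described above can be made concrete by counting. The sketch below assumes, for illustration only, that the true-redshift binning matches the photo-z binning, so that n photo-z bins imply n parameters per bin (six per bin for six bins), and it counts only the number of distinct angular auto- and cross-spectra rather than the full number of data points.

```python
def count_spectra_and_parameters(n_bins):
    """Return (number of auto- plus cross-spectra, number of N_{i alpha} parameters)."""
    n_spectra = n_bins * (n_bins + 1) // 2   # pairs (i, j) with i <= j
    n_params = n_bins * n_bins               # one value per (photo-z bin, true-z bin)
    return n_spectra, n_params

for n in (2, 6, 10):
    print(n, count_spectra_and_parameters(n))
```

Both counts grow as the square of the number of bins, which is consistent with fractional constraints that change little as the binning is varied.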

4. DISCUSSION AND CONCLUSIONS 

We have shown that the ability to constrain general 
(i.e. non-Gaussian) photo-z error distributions with galaxy 
two-point correlation functions depends on the parameter- 
ization of the photo-z errors, the range of angular scales 
probed by the correlation function, and prior knowledge 
of the galaxy bias. 
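
The role of a bias prior can be illustrated with a toy Fisher matrix. In the sketch below, a Gaussian prior of width sigma on a parameter is added as 1/sigma^2 to the corresponding diagonal element before inversion, which is the standard way to include independent priors. The 2x2 matrix, the parameter ordering, and the helper names are invented for illustration; a "10% prior" is taken to mean sigma = 0.1 for a parameter with fiducial value 1.

```python
import math

def invert(matrix):
    """Invert a small, well-conditioned square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def marginalized_errors(fisher, prior_sigmas):
    """Add independent Gaussian priors (None = no prior); return 1-sigma errors."""
    f = [list(row) for row in fisher]
    for i, sigma in enumerate(prior_sigmas):
        if sigma is not None:
            f[i][i] += 1.0 / sigma ** 2
    cov = invert(f)
    return [math.sqrt(cov[i][i]) for i in range(len(f))]

# Toy example: parameter 0 is a photo-z parameter, parameter 1 is the galaxy bias.
toy_fisher = [[4.0, 3.0], [3.0, 4.0]]
no_prior = marginalized_errors(toy_fisher, [None, None])
with_prior = marginalized_errors(toy_fisher, [None, 0.1])
```

In this toy case the prior on the bias tightens the marginalized error on the photo-z parameter because the two parameters are strongly correlated.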

Binning the galaxy sample in photo-z, we have presented 
constraints on the binned redshift distribution and mean 
redshift in each photo-z bin. Parameterizing the redshift 
distribution by binned values is insensitive to small scat- 
ter from photo-z errors, but otherwise assumes no a priori 
knowledge of the photo-z error distribution. We find that 
reducing the number of parameters in each photo-z bin can 
be very helpful, which could be achieved with improved 
knowledge of the photo-z errors from, e.g., spectroscopi- 
cally calibrated samples or luminosity function priors. 
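
To show how constraints on the binned distribution translate into a constraint on the mean redshift of a photo-z bin, the sketch below applies first-order error propagation. It assumes the mean is the count-weighted average of the bin centres; the bin centres, counts, and covariance are made-up numbers, and the function names are illustrative.

```python
import math

def mean_redshift(n_alpha, z_centers):
    """Count-weighted mean redshift of one photo-z bin."""
    total = sum(n_alpha)
    return sum(n * z for n, z in zip(n_alpha, z_centers)) / total

def mean_redshift_error(n_alpha, z_centers, cov):
    """First-order error on the mean redshift given the covariance of the N_alpha."""
    total = sum(n_alpha)
    z_bar = mean_redshift(n_alpha, z_centers)
    grad = [(z - z_bar) / total for z in z_centers]   # d z_bar / d N_alpha
    size = len(grad)
    var = sum(grad[a] * cov[a][b] * grad[b] for a in range(size) for b in range(size))
    return math.sqrt(var)

# Hypothetical bin with three true-redshift bins of width 0.5.
z_centers = [0.25, 0.75, 1.25]
n_alpha = [90.0, 8.0, 2.0]
cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
z_bar = mean_redshift(n_alpha, z_centers)
sigma_z_bar = mean_redshift_error(n_alpha, z_centers, cov)
```

The gradient shows that uncertainty in the bins farthest from the mean, i.e. the outlier bins, contributes most to the error on the mean redshift.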

We have limited our use of the galaxy correlation func- 
tion to angular scales where the galaxy number density is 
Gaussian distributed. At low redshifts, this severely lim- 
its the amount of data available to constrain the photo-z 
error parameters. We hypothesize that including informa- 
tion from correlations on non-Gaussian scales could sig- 
nificantly improve the constraints and demonstrate that 
the constraints on the mean redshift in each photo-z bin 
do improve by two orders of magnitude with a naive ex- 
trapolation of our Gaussian calculation to non-Gaussian 
scales. 

If it is possible to separate the galaxies by spectral type, 
the constraints on the photo-z errors may improve further 
by including information from the cross-correlation of the 
galaxy sub-samples. We have demonstrated this in figs. 4 
and 5 by separating our fiducial galaxy sample into "red"
and "blue" spectral types. We expect this procedure to be
particularly helpful if there exists a well-populated spectral
class of galaxies whose photo-z's can be estimated
unusually well.
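
When the sub-population parameters are constrained jointly, the error on their sum follows from summing the relevant elements of the covariance (inverse Fisher) matrix. The sketch below shows this for two parameters; the covariance values are invented, and the function name is illustrative.

```python
import math

def combined_error(cov, index_red, index_blue):
    """Error on N = N_red + N_blue from the joint covariance matrix."""
    var = (cov[index_red][index_red] + cov[index_blue][index_blue]
           + 2.0 * cov[index_red][index_blue])
    return math.sqrt(var)

# Toy covariance with anti-correlated red and blue errors.
toy_cov = [[4.0, -3.0], [-3.0, 4.0]]
sigma_total = combined_error(toy_cov, 0, 1)
```

With anti-correlated errors, the error on the sum can be smaller than the error on either sub-population alone, which is one way that doubling the number of parameters can still reduce the errors on the combined distribution.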

Our forecasts are limited to parameters of the photo-z
error distribution and linear galaxy bias so we cannot make
any rigorous conclusions about what kind of dark energy
constraints can be achieved in weak lensing and baryon
acoustic oscillation surveys with the level of photo-z errors
forecasted here. However, we make qualitative comparisons
with dark energy forecasts in the literature (Huterer
et al., 2006; Ma et al., 2006; Zhan & Knox, 2005; Zhan,
2006) using our constraints on the mean redshift in each
photo-z bin given in table 2. In the Gaussian regime,
the constraints we forecast of ~ 0.01 are factors of a few
larger than those desired for upcoming dark energy surveys.
However, adding non-Gaussian scales in the correlation
function may provide the required constraints. The
galaxy correlation properties are quite likely to provide at
least a powerful consistency test for the redshift distributions
as determined via spectroscopic and/or "super" (12
or more band) calibration subsamples.

We thank M. Auger, G. Bernstein, D. Huterer, D. Koo, 
J. Newman, and J. A. Tyson for useful conversations.
This work was supported in part by NSF grant 0307961. 

APPENDIX A 

HALO MODEL 

The halo model (for a review see Cooray & Sheth, 2002) 
provides an analytic approximation to the nonlinear three- 
dimensional galaxy power spectrum using the assumptions 
that all the dark matter is contained in gravitationally 
bound "halos" of varying mass and that the number of 
galaxies populating a given dark matter halo is determined 
solely by the halo mass and redshift. The model for the 
number of galaxies in a dark matter halo is often referred 
to as the "halo occupation distribution" (HOD). 

A.1. Fiducial HOD models

Following Hu & Jain (2004), we divide the mean number
of galaxies in a dark matter halo of mass m into contributions
from a galaxy at the halo's center ($N_c$) and satellite
galaxies ($N_s$). The mean number of central galaxies is
essentially a unit step function parameterized by a minimum
threshold mass, $m_{\rm th}(z)$, for a halo to host a galaxy. To
allow for scatter in the relation between galaxy luminosity
and halo mass, the simple step function is modified to,

$$ N_c^{a}(m, z) = \frac{1}{2} f^{a}(m, z)\, {\rm Erfc}\left( \frac{\log\left( m_{\rm th}(z)/m \right)}{\sigma} \right), \tag{A1} $$

where $f^{a}(m, z)$ is the fraction of central galaxies of spectral
type $a$. We use eq. 7 in Cooray (2006) as our fiducial model
for $f^{a}(m, z)$ and set $\sigma = 0.1$.

The mean number of satellite galaxies is modelled as a
power law,

$$ N_s(m, z) = \left( \frac{m}{A\, m_{\rm th}(z)} \right)^{b}, \tag{A2} $$

with the two free parameters $A \sim 30$, $b \sim 1$ (Hu & Jain,
2004).
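
A minimal numerical sketch of such an occupation model is given below. It assumes the forms N_c = (1/2) f erfc[ln(m_th/m)/sigma] and N_s = (m/(A m_th))^b, with a natural logarithm, the spectral-type fraction f set to 1 by default, and no redshift dependence; the function names and the example threshold mass are illustrative assumptions rather than values from this paper.

```python
import math

def n_central(m, m_th, sigma=0.1, frac=1.0):
    """Mean number of central galaxies: a step at m_th smoothed by scatter sigma."""
    return 0.5 * frac * math.erfc(math.log(m_th / m) / sigma)

def n_satellite(m, m_th, A=30.0, b=1.0):
    """Mean number of satellite galaxies: power law in halo mass."""
    return (m / (A * m_th)) ** b

m_th = 1.0e12  # hypothetical threshold mass in solar masses
for mass in (1.0e11, 1.0e12, 1.0e13, 3.0e13):
    print(f"{mass:.1e}", n_central(mass, m_th), n_satellite(mass, m_th))
```

With these choices the central occupation is 1/2 at the threshold mass and approaches 1 well above it, while the satellite occupation reaches 1 at A times the threshold mass.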



The threshold mass, $m_{\rm th}$, is determined by requiring the
HOD model to reproduce the fiducial redshift distribution
as follows:

$$ \frac{d^2 N}{dz\, d\Omega}(z) = \chi^2(z)\, \frac{d\chi}{dz} \int dm\, n(m, z) \left( N_c(m, z) + N_s(m, z) \right) \equiv \chi^2(z)\, \frac{d\chi}{dz}\, n_g(z), \tag{A3} $$

where $\chi(z)$ is the comoving distance as a function of redshift
and $n(m, z)$ is the halo mass function (we use the
Sheth-Tormen model for $n(m, z)$; Sheth & Tormen, 1999).

A.2. Galaxy power spectrum

The power spectrum of galaxies in the halo model is the
sum of two terms:

$$ P_g(k, z_1, z_2) = P_{1h}(k, z_1, z_2) + P_{2h}(k, z_1, z_2), $$

where,

$$ P_{1h}(k, z_1, z_2) = \frac{1}{n_g^2(z_1)} \int dm\, n(m, z_1) \left( N_s^2(m, z_1)\, u_g^2(k|z_1, m) + 2 N_c(m, z_1) N_s(m, z_1)\, u_g(k|z_1, m) \right) $$

is the contribution to the power from a single halo, and

$$ P_{2h}(k, z_1, z_2) = P^{\rm lin}(k, z_1, z_2)\, h(k, z_1)\, h(k, z_2) \tag{A4} $$

with,

$$ h(k, z) = \frac{1}{n_g(z)} \int dm\, n(m, z)\, b_h(m, z) \left( N_c(m, z) + N_s(m, z)\, u_g(k|z, m) \right) $$

is the contribution from 2 different halos. Here, $u_g(k|z, m)$
is the Fourier transform of the galaxy number density profile
(assumed to follow the NFW profile (Navarro et al.,
1996)), $b_h(m, z)$ is the halo bias (specified in the
Sheth-Tormen model along with the mass function) and $n_g(z)$ is
the mean comoving number density of galaxies defined in
eq. A3.

A.3. Linear power spectrum

The linear power spectrum in eq. A4 is the variance per
logarithmic interval in k, in linear perturbation theory

$$ P^{\rm lin}(k, z_1, z_2) = \delta_H^2(0)\, g(z_1)\, g(z_2) \left( \frac{k}{H_0} \right)^{n+3} T^2(k), $$

where $\delta_H(0)$ is the amplitude at z = 0, $g(z)$ is the linear
growth function, $T(k)$ is the transfer function, and $H_0$ is
the Hubble constant. We take a fiducial value of n = 1.

A.4. Linear galaxy bias

The galaxy bias in the halo model is given by,

$$ b^{(a)}(z_\alpha) = \int dm\, n(m, z_\alpha)\, b_h(m, z_\alpha)\, \frac{\left( N_c^{(a)} + N_s^{(a)} \right)(m, z_\alpha)}{n_g^{(a)}(z_\alpha)}, \tag{A5} $$

where the superscript (a) labeling different galaxy
sub-populations denotes different values of the parameters $A$
and $b$ in eq. A2 and $m_{\rm th}$.